Method for preserving memory affinity in a non-uniform memory access data processing system

ABSTRACT

A method for preserving memory affinity in a computer system is disclosed. The method reduces and sometimes eliminates memory affinity loss due to process migration by restoring the proper memory affinity through dynamic page migration. The memory affinity access patterns of individual pages are tracked continuously. If a particular page is found almost always to be accessed from a particular remote access affinity domain for a certain number of times, and without any intervening requests from other access affinity domains, the page will migrate to that particular remote affinity domain so that subsequent memory accesses become local memory accesses. As a result, the proper pages are migrated to increase memory affinity.

PRIORITY CLAIM

The present application is a continuation of U.S. patent application Ser. No. 13/015,733, filed on Jan. 28, 2011, the contents of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Technical Field

The present disclosure relates to non-uniform memory access data processing systems in general, and in particular to a method for preserving memory affinity in a non-uniform memory access data processing system.

2. Description of Related Art

Generally speaking, the performance of a computer system largely depends on the execution speed of system and application programs. During program execution, both instructions and data need to be fetched from a system memory. While the frequency of memory access has been greatly reduced via the utilization of a cache hierarchy, system memory accesses after cache misses still account for a significant portion of program execution time.

The disparity between program execution time and memory access time continues to increase even with various improvements in computer hardware technology. In fact, while program execution time decreases when processor frequency increases, as expected, the number of processor cycles needed to retrieve data from a system memory effectively increases. For example, when the clock frequency of a processor is doubled, the execution time of an integer instruction is likely to be reduced by half, but the number of processor clocks for accessing a memory may actually be doubled. In addition, memory speed has not been keeping up with the processor clock speed. For example, processor clock speed had increased about 60% to 100% from one processor generation to another while memory speed had increased only 25% within the same time frame.

One way to shorten memory access time is to place a system memory as close to processors as possible physically. But in a large server system, it is difficult to position the system memory in the ideal proximity to processors under the form factor of the server system, which leads to varying latencies to access different regions of the system memory. Thus, large server systems tend to use a distributed memory model known as non-uniform memory access (NUMA). One challenge for a NUMA computer system is to maintain high memory affinity to various processors where threads/processes are being executed. High memory affinity implies that blocks or pages of the system memory that are used local to a processor are positioned in a memory region close to the processor.

Currently, an operating system can start a program with a high memory affinity by allocating newly accessed pages in a local memory affinity domain, i.e., in a local memory or a memory having minimal latency. This strategy, however, cannot cope with changes in memory affinity stemming from certain operations initiated by the operating system.

For example, for load balancing purposes, processes may have to be migrated from heavily utilized processors to less utilized ones. Also, in order to decrease power consumption, processor folding operations can be utilized to force process migration for freeing and powering down some processors when the system load decreases. Process migration can also occur when system load increases, which may result in processor unfolding to spread out the increased workload to more processors. All these dynamically occurring process migrations can cause a loss in memory affinity, which can lead to various degrees of performance degradation due to an increase in remote memory accesses.

One prior art solution for preserving memory affinity is to ban process migration completely. This strategy can certainly reduce the likelihood of losing memory affinity, but at the expense of forgoing the flexibility of the system to perform proper load balancing and/or processor folding. Importantly, even with this drastic measure, a system still may not be able to cope with a shift of memory affinity due to dynamically changing access patterns. This can happen, for example, when a page is shared by processors from multiple affinity domains, and at different computational phases a different processor becomes the dominant source of access to the page.

Another prior art solution is to migrate pages along with a process migration. This solution raises the problem of not knowing which pages to migrate with the job, and sometimes the wrong pages may be migrated, which will actually reduce memory affinity system-wide. This problem is particularly bad for pages that are shared among processes migrating to different computing resources.

Consequently, it would be desirable to provide an improved method for preserving memory affinity in a NUMA computer system.

SUMMARY OF THE INVENTION

In accordance with a preferred embodiment of the present disclosure, in response to a request for memory access to a page within a memory affinity domain, a determination is made whether or not the request is initiated by a processor associated with the memory affinity domain. If the request is not initiated by a processor associated with the memory affinity domain, a determination is made whether or not there is a page ID match with an entry within a page migration tracking module associated with the memory affinity domain. If there is no page ID match with any entry within the page migration tracking module, an entry is selected within the page migration tracking module to be updated with a new page ID and a new memory affinity ID. If there is a page ID match with an entry within the page migration tracking module, then another determination is made whether or not there is a memory affinity ID match with the entry with the page ID field match. If there is no memory affinity ID match, the entry with the page ID field match is updated with a new memory affinity ID. If there is a memory affinity ID match, an access counter of the entry with the page ID field match is incremented.

All features and advantages of the present disclosure will become apparent in the following detailed written description.

BRIEF DESCRIPTION OF THE DRAWINGS

The disclosure itself, as well as a preferred mode of use, further objects, and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:

FIG. 1 is a block diagram of a shared-memory multiprocessor system, in which an embodiment of the present invention may be implemented;

FIG. 2 is a block diagram of a page migration tracking module, in accordance with an embodiment of the present invention; and

FIG. 3 is a high-level logic flow diagram of a method for preserving memory affinity in the shared-memory multiprocessor system from FIG. 1, in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

Referring now to the drawings and in particular to FIG. 1, there is depicted a block diagram of a shared-memory multiprocessor system having a non-uniform memory access (NUMA) architecture, in which a preferred embodiment of the present invention may be implemented. As shown, a NUMA multiprocessor system 10 includes a node 11 a, a node 11 b, a node 11 c, and a node 11 d. Each of nodes 11 a-11 d has at least one processor connected to a local memory within the node via an intra-node connection mechanism such as a special bus or a crossbar switch. For example, multi-processor node 11 a contains processors P₁-P_(N) along with their respective cache memory connected to a main memory 13 a that is local to processors P₁-P_(N) via an intra-node bus 12. Each of nodes 11 a-11 d also contains an input/output (I/O) unit, such as I/O unit 14 a within node 11 a, for supporting connections to various peripherals such as printers, communication links, direct access storage devices, etc.

All nodes 11 a-11 d are interconnected by a Scalable Coherent Interconnect (SCI) 16. SCI 16 is a high-bandwidth interconnection network capable of providing cache coherence throughout NUMA multiprocessor system 10. Each of nodes 11 a-11 d has a NUMA bridge, such as a NUMA bridge 15 a in node 11 a, to provide connections to SCI 16 in order to maintain inter-nodal connection among nodes 11 a-11 d.

All processors within NUMA multiprocessor system 10 share an identical addressable main memory, which is distributed among nodes 11 a-11 d as local main memories 13 a-13 d. Because all local main memories 13 a-13 d are accessible by all the processors within NUMA multiprocessor system 10, the total addressable main memory space within NUMA multiprocessor system 10 includes the combination of all local main memories 13 a-13 d. Each byte of system main memory is addressable by a unique real address. The bus logic for each of nodes 11 a-11 d monitors all memory accesses by the processors and the I/O unit within a node and then directs each local memory access to the node's local main memory. Remote accesses to a non-local main memory are sent to SCI 16 via a NUMA bridge 15 within the requesting node.
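By way of illustration only, the local/remote routing decision described above might be sketched as follows. The sketch assumes, hypothetically, that each node owns one equal-sized contiguous range of real addresses; the names NODE_MEMORY_BYTES, home_node, and route_access are not part of the disclosure and are introduced solely for this example.

```python
# Illustrative sketch only: real hardware performs this decision in bus logic.
# ASSUMPTION: each node owns one contiguous, equal-sized range of real addresses.
NODE_MEMORY_BYTES = 1 << 30                # hypothetical: 1 GiB of local memory per node
NODE_IDS = ["11a", "11b", "11c", "11d"]    # labels borrowed from FIG. 1


def home_node(real_address: int) -> str:
    """Return the node whose local main memory holds the given real address."""
    index = real_address // NODE_MEMORY_BYTES
    if not 0 <= index < len(NODE_IDS):
        raise ValueError("real address outside the addressable main memory space")
    return NODE_IDS[index]


def route_access(requesting_node: str, real_address: int) -> tuple:
    """Classify an access as local, or as remote and therefore sent over the SCI."""
    target = home_node(real_address)
    if target == requesting_node:
        return ("local", target)
    return ("remote via SCI", target)
```

Under these assumptions, an access by a processor in node 11 a to an address in the second range would be classified as remote and forwarded toward node 11 b.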

Within the NUMA architecture, various multi-processor nodes can be grouped into different software partitions by an operating system via a process known as software partitioning, as is well known to those skilled in the relevant art.

As mentioned above, process migration can be utilized to perform load balancing and/or processor folding in order to control energy consumption of a NUMA computer system such as NUMA multiprocessor system 10 from FIG. 1. However, process migration may also contribute to loss of memory affinity that leads to system performance degradation. The present invention reduces or even eliminates memory affinity loss due to process migration by restoring the proper memory affinity via dynamic page migration.

In accordance with a preferred embodiment of the present invention, a page migration tracking module is utilized to manage process migration. The page migration tracking module keeps track of memory affinity access patterns to a physical memory.

With reference now to FIG. 2, there is depicted a block diagram of a page migration tracking module, in accordance with an embodiment of the present invention. As shown, a page migration tracking module 20 includes multiple entries 21. Each of entries 21 includes a real page identification (ID) field 22, a memory affinity ID field 23, an access counter field 24, and a status flag field 25. Memory affinity ID field 23 contains an ID of a remote access. Access counter field 24 tracks the number of memory accesses to an associated page. Status flag field 25 indicates whether an entry is valid (i.e., used or busy) or invalid (i.e., free). Entries 21 within page migration tracking module 20 can be organized in a direct-mapped or set-associative manner.
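As a minimal software sketch of the entry layout of FIG. 2, assuming a direct-mapped organization, the four fields might be modeled as shown below. The class names, the entry count, and the modulo index function are hypothetical choices made for this example; the disclosure does not specify field widths or an indexing function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrackingEntry:
    """One entry 21: the four fields named in FIG. 2."""
    real_page_id: int = 0     # real page ID field 22
    affinity_id: int = 0      # memory affinity ID field 23
    access_count: int = 0     # access counter field 24
    valid: bool = False       # status flag field 25 (False means free)


class DirectMappedTracker:
    """Hypothetical direct-mapped organization of entries 21."""

    def __init__(self, num_entries: int = 8):
        self.entries = [TrackingEntry() for _ in range(num_entries)]

    def index_of(self, real_page_id: int) -> int:
        # ASSUMPTION: the slot is chosen by a simple modulo of the real page ID.
        return real_page_id % len(self.entries)

    def lookup(self, real_page_id: int) -> Optional[TrackingEntry]:
        """Return the valid entry tracking this page, or None on a miss."""
        entry = self.entries[self.index_of(real_page_id)]
        if entry.valid and entry.real_page_id == real_page_id:
            return entry
        return None
```

A set-associative organization would differ only in that each index selects a small group of entries that are compared in parallel.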

A memory affinity domain is defined as a group of memories that are in physical proximity; thus, any access to a memory within a memory affinity domain will experience identical memory access latency. Each memory affinity domain is preferably associated with a page migration tracking module, such as page migration tracking module 20. Every time a memory access is made to a memory affinity domain, the associated page migration tracking module is checked based on the real page ID of the memory access. If a page located within a first memory affinity domain is found to be requested by processors associated with a second memory affinity domain on a relatively regular basis, then that page is migrated from the first memory affinity domain to the second memory affinity domain.

In response to a memory access to a memory affinity domain by a processor, the real page ID and the memory affinity domain ID of the requesting processor are extracted from the address of the memory access. For NUMA multiprocessor system 10 from FIG. 1, each of nodes 11 a-11 d can be defined as one memory affinity domain, and each unique node ID can be utilized as a memory affinity domain ID accordingly.

Alternatively, in some computer systems, a processor has on-chip memory controllers for accessing its local memory, so a processor chip having an associated off-chip local memory can be defined as one memory affinity domain. Thus, each unique processor chip ID of a processor can be utilized as a memory affinity domain ID. Since the processor ID is readily available from the memory access itself, it should be relatively straightforward to deduce the corresponding processor chip ID.
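The derivation of the two lookup keys described above can be sketched as follows. The sketch assumes 4 KiB pages and a fixed number of processors per memory affinity domain; both values, as well as the MemoryAccess record and the function names, are hypothetical and are not taken from the disclosure.

```python
from typing import NamedTuple

PAGE_SHIFT = 12               # ASSUMPTION: 4 KiB pages
PROCESSORS_PER_DOMAIN = 4     # ASSUMPTION: processors are numbered contiguously per domain


class MemoryAccess(NamedTuple):
    """Hypothetical record of what accompanies a memory access."""
    real_address: int
    processor_id: int


def real_page_id(access: MemoryAccess) -> int:
    """Drop the in-page offset bits to obtain the real page ID."""
    return access.real_address >> PAGE_SHIFT


def requester_affinity_id(access: MemoryAccess) -> int:
    """Deduce the requester's memory affinity domain ID from its processor ID."""
    return access.processor_id // PROCESSORS_PER_DOMAIN
```

With these assumed values, an access to real address 0x12345 by processor 6 would yield real page ID 0x12 and memory affinity domain ID 1.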

Referring now to FIG. 3, there is illustrated a high-level logic flow diagram of a method for preserving memory affinity in a NUMA data processing system, such as NUMA multiprocessor system 10 from FIG. 1, in accordance with an embodiment of the present invention. Starting at block 30, in response to a request for memory access to a page within a memory affinity domain, a determination is made as to whether or not the request is from a remote processor (i.e., a processor associated with a different memory affinity domain), as shown in block 31. If the request is not from a remote processor, then the request will be served accordingly, as depicted in block 40.

Otherwise, if the request is from a remote processor, then a determination is made as to whether or not there is a real page ID match in a real page ID field (such as real page ID field 22 from FIG. 2) of a page migration tracking module associated with the memory affinity domain, as depicted in block 32. If there is no match in the real page ID field, then the real page ID field and a memory affinity ID field (such as memory affinity ID field 23 from FIG. 2) of the least-recently-used one of the entries within the page migration tracking module will be replaced by a new real page ID and a new memory affinity ID, respectively, as shown in blocks 33-34. In addition, its access counter field (such as access counter field 24 from FIG. 2) will be reset to, for example, one, and the request will be served accordingly, as depicted in block 40.

However, if there is a match in the page ID field, then a determination is made as to whether or not there is a match in the memory affinity ID field of the same entry with the page ID field match, as shown in block 35. If there is no match in the memory affinity ID field of the same entry with the page ID field match, then the memory affinity ID field of the same entry with the page ID field match will be replaced by a new memory affinity ID, as depicted in block 34. In addition, its access counter field will be reset to, for example, one, and the request will be served accordingly, as depicted in block 40. If there is a match in the memory affinity ID field of the same entry with the page ID field match, then the access counter field of the same entry with the page ID field match will be incremented, as depicted in block 36.
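The tracking decisions of blocks 31 through 36 can be summarized in a short software sketch. This is an illustration under stated assumptions, not the hardware implementation: the class name, the table capacity, the string return values, and the use of an ordered dictionary to model least-recently-used replacement are all hypothetical.

```python
from collections import OrderedDict


class PageMigrationTracker:
    """Software model of one tracking module; the least-recently-used entry comes first."""

    def __init__(self, home_domain: int, capacity: int = 4):
        self.home_domain = home_domain
        self.capacity = capacity            # ASSUMPTION: small fixed number of entries
        self.entries = OrderedDict()        # real page ID -> [memory affinity ID, access counter]

    def record_access(self, page_id: int, requester_domain: int) -> str:
        # Block 31: requests from the local domain are served without tracking.
        if requester_domain == self.home_domain:
            return "local"
        entry = self.entries.get(page_id)
        # Block 32: no real page ID match, so reuse the least-recently-used entry.
        if entry is None:
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)
            self.entries[page_id] = [requester_domain, 1]   # counter reset to one
            return "new"
        self.entries.move_to_end(page_id)   # mark the entry as most recently used
        # Block 35: page matches but a different remote domain is now asking.
        if entry[0] != requester_domain:
            entry[0] = requester_domain
            entry[1] = 1                    # counter reset to one
            return "retarget"
        # Block 36: same page and same remote domain, so count the access.
        entry[1] += 1
        return "count"
```

In every case the memory request itself is still served; the sketch models only the bookkeeping that accompanies it.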

Regarding the page migration mechanism, the operating system needs to be informed in order to process the page migration request. There are two possible embodiments, either of which can generate a page migration. Preferably, upon an access counter reaching a predetermined threshold value, the hardware issues a hardware interrupt with the page ID of the page to migrate. The operating system then processes the hardware interrupt by migrating the page to the remote processor memory domain. Alternatively, a separate queue can be implemented in hardware to buffer multiple real page IDs. The operating system either polls the queue on clock ticks or the hardware generates a hardware interrupt when there is at least one waiting request in the queue. For the present embodiment depicted in FIG. 3, the operating system is alerted that such page needs to be migrated to the remote processor and its memory affinity region, as depicted in block 38.

If the page migration request to the operating system or to the queue is not accepted because the operating system is busy or the queue is full, the hardware simply does nothing; when the next remote request comes in, the hardware will simply request a page migration again, effectively retrying the previous page migration request. This behaves much like an in-place queuing mechanism as an extension to the waiting request queue. If the page migration request is accepted or enqueued, the entry within the page migration tracking module is freed by marking the same entry with the page ID field match as invalid, as shown in block 39.
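The queue-based alternative and its retry behavior can be sketched as follows. The threshold value, the queue capacity, the dictionary-based entry, and all names are hypothetical choices for this example; the comparison uses "at or above the threshold" so that a rejected request is naturally retried on the next matching remote access.

```python
from collections import deque

THRESHOLD = 3         # ASSUMPTION: predetermined threshold value
QUEUE_CAPACITY = 2    # ASSUMPTION: depth of the hardware request queue


class MigrationQueue:
    """Bounded queue of (real page ID, target domain) pairs awaiting the operating system."""

    def __init__(self):
        self.pending = deque()

    def try_enqueue(self, page_id: int, target_domain: int) -> bool:
        if len(self.pending) >= QUEUE_CAPACITY:
            return False                   # queue full: request not accepted
        self.pending.append((page_id, target_domain))
        return True


def on_matching_remote_access(entry: dict, queue: MigrationQueue) -> str:
    """Handle a remote access that matched both the page ID and the affinity ID."""
    entry["count"] += 1                    # block 36
    if entry["count"] < THRESHOLD:
        return "counting"
    if queue.try_enqueue(entry["page_id"], entry["affinity_id"]):   # block 38
        entry["valid"] = False             # block 39: free the tracking entry
        return "requested"
    return "retry-later"                   # do nothing now; the next access retries
```

Because a rejected entry stays valid with its counter at or above the threshold, the entry itself serves as the in-place extension of the waiting request queue.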

As has been described, the present disclosure provides a method for preserving memory affinity in a NUMA data processing system. The present invention reduces and sometimes eliminates memory affinity loss due to process migration by restoring the proper memory affinity through dynamic page migration. The memory affinity access patterns of individual pages are tracked continuously. If a particular page is found almost always to be accessed from a particular remote access affinity domain for a certain number of times, and without any intervening requests from other access affinity domains, the page will migrate to that particular remote affinity domain so that subsequent memory accesses become local memory accesses. As a result, the proper pages are migrated to increase memory affinity.

It is also important to note that although the present invention has been described in the context of a fully functional computer system, those skilled in the art will appreciate that the mechanisms of the present invention are capable of being distributed as a program product in a variety of recordable type media such as compact discs and digital video discs.

While the disclosure has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the disclosure.

1. A method for preserving memory affinity in a non-uniform memory access data processing system, said method comprising: in response to a request for memory access to a page within a first memory affinity domain, determining whether or not said request is initiated by a processor associated with said memory affinity domain; in response to the determination that said request is not initiated by a processor associated with said memory affinity domain, determining whether or not there is a page ID match with an entry within a page migration tracking module associated with said memory affinity domain; in response to the determination that there is no page ID match with any entry within said page migration tracking module, selecting an entry within said page migration tracking module and providing said entry with a new page ID and a new memory affinity ID; in response to the determination that there is a page ID match with an entry within said page migration tracking module, determining whether or not there is a memory affinity ID match with said entry with the page ID field match; in response to the determination that there is no memory affinity ID match, updating said entry with the page ID field match with a new memory affinity ID; and in response to the determination that there is a memory affinity ID match, incrementing an access counter of said entry with the page ID field match.
2. The method of claim 1, wherein said method further includes in a determination that said request is initiated by a processor associated with said memory affinity domain, serving said request.
3. The method of claim 1, wherein said selecting further includes resetting an access counter of said entry.
4. The method of claim 1, wherein said updating further includes resetting an access counter of said entry.
5. The method of claim 1, wherein said memory affinity domain is defined as a group of memories that are in physical proximity, wherein any access to a memory within said memory affinity domain experiences identical memory access latency.
6. The method of claim 1, wherein each memory affinity domain is associated within a page migration tracking module.
7. The method of claim 1, wherein said selecting further includes selecting a least-recently-used entry.
8. The method of claim 7, wherein said method further includes marking said entry as invalid.
9. The method of claim 1, wherein said memory affinity tracking module includes a real page ID field, a memory affinity ID field, and an access counter field.
10. The method of claim 1, wherein said method further includes alerting an operating system a page needs to be migrated to a remote processor and its memory affinity region.